Two-Stage Hashing for Fast Document Retrieval

Authors

  • Hao Li
  • Wei Liu
  • Heng Ji
Abstract

This work achieves sublinear-time Nearest Neighbor Search (NNS) in massive-scale document collections. The primary contribution is a two-stage unsupervised hashing framework that integrates two state-of-the-art hashing algorithms, Locality Sensitive Hashing (LSH) and Iterative Quantization (ITQ). LSH handles neighbor candidate pruning, while ITQ provides efficient and effective reranking over the neighbor pool captured by LSH. Furthermore, the proposed framework exploits both term and topic similarity among documents, leading to precise document retrieval. The experimental results show that our hashing-based document retrieval approach closely approximates the conventional Information Retrieval (IR) method in retrieving semantically similar documents, while achieving a speedup of over one order of magnitude in query time.
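The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes random-hyperplane LSH with several hash tables for stage one, the standard ITQ training loop (PCA followed by alternating sign quantization and an orthogonal Procrustes rotation update) for stage two, and that stage one hashes term vectors while stage two hashes topic vectors. The abstract does not specify that assignment, and all function names, parameters, and defaults below are hypothetical.

```python
import numpy as np
from collections import defaultdict


def build_lsh_tables(X_term, n_tables=4, n_bits=8, seed=0):
    """Stage 1 index: random-hyperplane LSH (table count and bit width are assumptions)."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_tables, n_bits, X_term.shape[1]))
    tables = []
    for t in range(n_tables):
        table = defaultdict(list)
        bits = (X_term @ planes[t].T) > 0          # sign pattern of each document
        for i, row in enumerate(bits):
            table[row.tobytes()].append(i)         # bucket key = bit pattern
        tables.append(table)
    return planes, tables


def lsh_candidates(q_term, planes, tables):
    """Union of the buckets the query falls into; no scan over the collection."""
    cand = set()
    for P, table in zip(planes, tables):
        key = ((P @ q_term) > 0).tobytes()
        cand.update(table.get(key, []))
    return np.array(sorted(cand), dtype=int)


def train_itq(X_topic, n_bits, n_iter=20, seed=0):
    """Stage 2 training: PCA, then learn a rotation that reduces quantization error."""
    rng = np.random.default_rng(seed)
    mean = X_topic.mean(axis=0)
    Xc = X_topic - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_bits].T                              # top principal directions
    V = Xc @ W
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))  # random orthogonal start
    for _ in range(n_iter):
        B = np.where(V @ R >= 0, 1.0, -1.0)        # fix R, update binary codes
        U, _, Zt = np.linalg.svd(B.T @ V)          # fix B, solve orthogonal Procrustes
        R = Zt.T @ U.T
    return mean, W @ R


def itq_encode(X_topic, mean, proj):
    """Binary ITQ codes as a boolean array."""
    return (X_topic - mean) @ proj >= 0


def two_stage_search(q_term, q_topic, planes, tables, codes, mean, proj, k=5):
    """LSH prunes candidates; ITQ Hamming distance reranks the surviving pool."""
    cand = lsh_candidates(q_term, planes, tables)
    q_code = itq_encode(q_topic[None, :], mean, proj)[0]
    dists = np.count_nonzero(codes[cand] != q_code, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return cand[order], dists[order]
```

A caller would build the tables and ITQ codes once over the collection, then call `two_stage_search` per query. The query cost depends on the size of the LSH candidate pool rather than on the collection size, which is where a speedup of the kind the abstract reports would come from.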


Similar resources

A Novel Method for Satellite Image Retrieval using Semantic Mining and Hashing

Satellite images play an important role in the collection of geographical information. However, their use is severely limited by the complexity of retrieving them. Traditional text-based methods have also failed to yield the desired results in a reasonable time. Therefore, content-based image retrieval using high semantic features has been developed to overcome problems related to text...


Extensions to Self-Taught Hashing: Kernelisation and Supervision

The ability of fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Since each bit in the binary code f...


Fast Supervised Discrete Hashing and its Analysis

In this paper, we propose a learning-based supervised discrete hashing method. Binary hashing is widely used for large-scale image retrieval as well as video and document searches because the compact representation of binary code is essential for data storage and reasonable for query searches using bit-operations. The recently proposed Supervised Discrete Hashing (SDH) efficiently solves mixed-...


Near Duplicate Image Detection: min-Hash and tf-idf Weighting

This paper proposes two novel image similarity measures for fast indexing via locality sensitive hashing. The similarity measures are applied and evaluated in the context of near duplicate image detection. The proposed method uses a visual vocabulary of vector quantized local feature descriptors (SIFT) and for retrieval exploits enhanced min-Hash techniques. Standard min-Hash uses an approximat...


Locality Constrained Deep Supervised Hashing for Image Retrieval

Deep Convolutional Neural Network (DCNN) based deep hashing has shown its success for fast and accurate image retrieval, however directly minimizing the quantization error in deep hashing will change the distribution of DCNN features, and consequently change the similarity between the query and the retrieved images in hashing. In this paper, we propose a novel Locality-Constrained Deep Supervis...




Publication date: 2014